[% setvar title docs/pdds/pdd07_codingstd.pod - Conventions and Guidelines for Parrot Source Code %]
<div id="archive-notice">
    <h3>This file is part of the Perl 6 Archive</h3>
    <p>To see what is currently happening visit <a href="http://www.perl6.org/">http://www.perl6.org/</a></p>
</div>
<div class='pod'>
<a name='NAME'></a><h1>NAME</h1>
<p>docs/pdds/pdd07_codingstd.pod - Conventions and Guidelines for Parrot Source Code</p>
<a name='ABSTRACT'></a><h1>ABSTRACT</h1>
<p>This document describes the various rules, guidelines and advice for
those wishing to contribute to the source code of Parrot, in such areas
as code structure, naming conventions, comments etc.</p>
<a name='DESCRIPTION'></a><h1>DESCRIPTION</h1>
<p>One of the criticisms of Perl 5 is that its source code is
impenetrable to newcomers, due to such things as inconsistent or
obscure variable naming conventions, lack of comments in the source
code, and so on.  We don't intend to make the same mistake when writing
Parrot. Hence this document.</p>
<p>We define three classes of conventions. Those that say <i>must</i> are
mandatory, and code will not be accepted (apart from in exceptional
circumstances) unless it follows these rules. Those that say <i>should</i>
are strong guidelines that should normally be followed unless there
is a sensible reason to do otherwise.  Finally, those that say <i>may</i>,
are tentative suggestions to be used at your discretion.</p>
<p>Note this particular PDD makes some recommendations that are
specific to the C programming language. This does not preclude Parrot
(or Perl 6) being implemented in other languages, but in this case,
additional PDDs may need to be authored for the extra language-specific
features.</p>
<a name='IMPLEMENTATION'></a><h1>IMPLEMENTATION</h1>
<a name='Coding style'></a><h2>Coding style</h2>
<p>The following <i>must</i> apply:</p>
<ul>
<li><a name='4 column indents for code and 2 column indents for nested CPP #directives. All indentation must consist of spaces, no tabs (for ease of patching).'></a>4 column indents for code and 2 column indents for nested CPP #directives.
All indentation must consist of spaces, no tabs (for ease of patching).</li>
<p>There are two exceptions to the CPP indenting- neither PARROT_IN_CORE nor
the outermost _GUARD #ifdefs cause the level of indenting to increase.</p>
<p>To ensure that tabs aren't inadvertently used for indentation, the following
boilerplate code must appear at the bottom of each source file. (This
rule may be rescinded if I'm ever threatened with a lynching....)</p>
<pre>   /*
    * Local variables:
    * c-indentation-style: bsd
    * c-basic-offset: 4
    * indent-tabs-mode: nil
    * End:
    *
    * vim: expandtab shiftwidth=4:
    */</pre>
<li><a name='Any other tabs are assumed to be on an 8-character boundary.'></a>Any other tabs are assumed to be on an 8-character boundary.</li>
<li><a name='ANSI C function prototypes'></a>ANSI C function prototypes</li>
<li><a name='&quot;K&amp;R&quot; style for indenting control constructs: ie the closing } should line up with the opening if etc.'></a>&quot;K&amp;R&quot; style for indenting control constructs: ie the closing <code>}</code> should
line up with the opening <code>if</code> etc.</li>
<li><a name='When a conditional spans multiple lines, the opening { must line up with the if or while, or be at the end-of-line otherwise.'></a>When a conditional spans multiple lines, the opening <code>{</code> must line up
with the <code>if</code> or <code>while</code>, or be at the end-of-line otherwise.</li>
<li><a name='Uncuddled elses: ie avoid } else {'></a>Uncuddled <code>else</code>s: ie avoid  <code>} else {</code></li>
<li><a name='No C++ style comments (//): some C compilers may choke on them'></a>No C++ style comments (<code>//</code>): some C compilers may choke on them</li>
<li><a name='Mark places that need to be revisited with XXX (and preferably your initials too), and revisit often!'></a>Mark places that need to be revisited with XXX (and preferably your
initials too), and revisit often!</li>
<li><a name='In function definitions, the name starts in column 0, with the return type on the previous line.'></a>In function definitions, the name starts in column 0, with the
return type on the previous line.</li>
<li><a name='However, in function declarations (in header files) the return type is kept on the same line.'></a>However, in function declarations (in header files) the return type is kept
on the same line.</li>
<li><a name='Variable names should be included for all function parameters in the function declarations. These names should match the parameters in the function definition.'></a>Variable names should be included for all function parameters in the
function declarations.  These names should match the parameters in the
function definition.</li>
<li><a name='Single space after keywords that are followed by (), eg return (x+y)*2, but no space between function name and following (), eg z = foo(x+y)*2'></a>Single space after keywords that are followed by <code>()</code>, eg
<code>return (x+y)*2</code>, but no space between function name and following <code>()</code>,
eg <code>z = foo(x+y)*2</code></li>
</ul>
<p>The following <i>should</i> apply</p>
<ul>
<li><a name='Do not exceed 79 columns'></a>Do not exceed 79 columns</li>
<li><a name='return foo; rather than return (foo);'></a><code>return foo;</code> rather than <code>return (foo);</code></li>
<li><a name='if (!foo) ... rather than if (foo == FALSE) ... etc.'></a><code>if (!foo) ...</code> rather than <code>if (foo == FALSE) ...</code> etc.</li>
<li><a name='Avoid assignments in conditionals, but if they're unavoidable, use Extra paren, e.g. if (a &amp;&amp; (b = c)) ...'></a>Avoid assignments in conditionals, but if they're unavoidable, use
Extra paren, e.g. <code>if (a &amp;&amp; (b = c)) ...</code></li>
<li><a name='Avoid double negatives, eg #ifndef NO_FEATURE_FOO'></a>Avoid double negatives, eg <code>#ifndef NO_FEATURE_FOO</code></li>
<li><a name='Binary operators should have a space on either side; parentheses should not have space immediately after the opening paren nor immediately before the closing paren, commas should have space after but not before, eg'></a>Binary operators should have a space on either side; parentheses should
not have space immediately after the opening paren nor immediately before
the closing paren, commas should have space after but not before, eg</li>
<pre>        x = (a + b) * f(c, d / e)</pre>
<li><a name='Long lines should be split before an operator so that it is immediately obvious when scanning the LHS of the code that this has happened, and two extra levels of indent should be used.'></a>Long lines should be split before an operator so that it is immediately
obvious when scanning the LHS of the code that this has happened, and two
extra levels of indent should be used.</li>
<li><a name='and/or split on matching parens, with the content on separate line(s) and with one extra indent:'></a>and/or split on matching parens, with the content on separate line(s) and
with one extra indent:</li>
<pre>    do_arbitrary_function(
        list_of_parameters_with_long_names, or_complex_subexpression(
            of_more_params, or_expressions + 1
        )
    );</pre>
</ul>
<p>To enforce the spacing, indenting, and bracing guidelines mentioned
above, the following arguments to GNU Indent should be used:</p>
<pre>   -kr -nce -sc -cp0 -l79 -lc79 -psl -nut -cdw -ncs -lps</pre>
<p>This expands out to:</p>
<ul>
<li><a name='-nbad'></a>-nbad</li>
<p>Do not force blank lines after declarations.</p>
<li><a name='-bap'></a>-bap</li>
<p>Force blank lines after procedure bodies.</p>
<li><a name='-bbo'></a>-bbo</li>
<p>Prefer to break long lines before boolean operators.</p>
<li><a name='-nbc'></a>-nbc</li>
<p>Do not force newlines after commas in declarations</p>
<li><a name='-br'></a>-br</li>
<p>Put braces on line with if, etc.</p>
<li><a name='-brs'></a>-brs</li>
<p>Put braces on struct declaration line.</p>
<li><a name='-c33'></a>-c33</li>
<p>Put comments to the right of code in column 33 (not recommended)</p>
<li><a name='-cd33'></a>-cd33</li>
<p>Put declaration comments to the right of code in column 33</p>
<li><a name='-ncdb'></a>-ncdb</li>
<p>Do not put comment delimiters on blank lines.</p>
<li><a name='-nce'></a>-nce</li>
<p>Do not cuddle } and else.</p>
<li><a name='-cdw'></a>-cdw</li>
<p>Do cuddle do { } while.</p>
<li><a name='-ci4'></a>-ci4</li>
<p>Continuation indent of 4 spaces</p>
<li><a name='-cli0'></a>-cli0</li>
<p>Case label indent of 0 spaces</p>
<li><a name='-ncs'></a>-ncs</li>
<p>Do not put a space after a cast operator.</p>
<li><a name='-d0'></a>-d0</li>
<p>Set indentation of comments not to the right of code to 0 spaces.</p>
<li><a name='-di1'></a>-di1</li>
<p>Put declaration variables 1 space after their types</p>
<li><a name='-nfc1'></a>-nfc1</li>
<p>Do not format comments in the first column as  normal.</p>
<li><a name='-nfca'></a>-nfca</li>
<p>Do not format any comments</p>
<li><a name='-hnl'></a>-hnl</li>
<p>Prefer to break long lines at the position of newlines in the input.</p>
<li><a name='-i4'></a>-i4</li>
<p>4-space indents</p>
<li><a name='-ip0'></a>-ip0</li>
<p>Indent parameter types in old-style function definitions by 0 spaces.</p>
<li><a name='-l79'></a>-l79</li>
<p>maximum line length for non-comment lines is 79 spaces.</p>
<li><a name='-lc79'></a>-lc79</li>
<p>maximum line length for comment lines is 79 spaces.</p>
<li><a name='-lp'></a>-lp</li>
<p>maximum line length for non-comment lines is 79 spaces.</p>
<li><a name='-npcs'></a>-npcs</li>
<p>Do not put a space after the function in function calls.</p>
<li><a name='-nprs'></a>-nprs</li>
<p>Do not put a space after every ´(´ and before every ´)´.</p>
<li><a name='-saf'></a>-saf</li>
<p>Put a space after each for.</p>
<li><a name='-sai'></a>-sai</li>
<p>Put a space after each if.</p>
<li><a name='-saw'></a>-saw</li>
<p>Put a space after each while.</p>
<li><a name='-sc'></a>-sc</li>
<p>Put the `*´ character at the left of comments.</p>
<li><a name='-nsob'></a>-nsob</li>
<p>Do not swallow optional blank lines.</p>
<li><a name='-nss'></a>-nss</li>
<p>Do not force a space before the semicolon  after  certain statements</p>
<li><a name='-nut'></a>-nut</li>
<p>Use spaces instead of tabs.</p>
<li><a name='-lps'></a>-lps</li>
<p>Leave space between `#´ and preprocessor directive.</p>
<li><a name='-psl'></a>-psl</li>
<p>Put the type of a procedure on the line before its name. (.c files), or</p>
<li><a name='-npsl'></a>-npsl</li>
<p>Leave a procedure declaration's return type alone (.h files)</p>
</ul>
<p>Please note that it is also necessary to include all typedef types
with the &quot;-T&quot; option to ensure that everything is formatted properly.</p>
<p>A script (<b><i>tools/dev/run_indent.pl</i></b>) is provided which runs <b><i>indent</i></b>
properly automatically.</p>
<a name='Naming conventions'></a><h2>Naming conventions</h2>
<ul>
<li><a name='Subsystems and APIs'></a>Subsystems and APIs</li>
<p>The Parrot core will be split into a number of subsystems, each with an
associated API. For the purposes of naming files, data structures, etc,
each subsystem will be assigned a short nickname, eg <i>pmc</i>, <i>gc</i>, <i>io</i>.
All code within the core will belong to a subsystem; miscellaneous code
with no obvious home will be placed in the special subsystem called
<i>misc</i>.</p>
<li><a name='Filenames'></a>Filenames</li>
<p>Filenames must be assumed to be case-insensitive, in the sense that
that you may not have two different files called <b><i>Foo</i></b> and <b><i>foo</i></b>. Normal
source-code filenames should be all lower-case; filenames with
upper-case letters in them are reserved for notice-me-first files such
as <b><i>README</i></b>, and for files which need some sort of pre-processing applied
to them or which do the preprocessing - eg a script <b><i>foo.SH</i></b> might
read <b><i>foo.TEMPLATE</i></b> and output <b><i>foo.c</i></b>.</p>
<p>The characters making up filenames must be chosen from the ASCII set
A-Z,a-z,0-9 plus .-_</p>
<p>An underscore should be used to separate words rather than a hyphen
(-).  A file should not normally have more than a single '.' in it, and
this should be used to denote a suffix of some description. The
filename must still be unique if the main part is truncated to 8
characters and any suffix truncated to 3 characters. Ideally, filenames
should restricted to 8.3 in the first place, but this is not
essential.</p>
<p>Each subsystem <i>foo</i> should supply the following files. This arrangement
is based on the assumption that each subsystem will - as far as is
practical - present an opaque interface to all other subsystems within the
core, as well as to extensions and embeddings.</p>
<ul>
<li><a name='foo.h'></a>foo.h</li>
<p>This contains all the declarations needed for external users of that
API (and nothing more), ie it defines the API. It is permissible for
the API to include different or extra functionality when used by other
parts of the core, compared with its use in extensions and embeddings.
In this case, the extra stuff within the file is enabled by testing for
the macro <code>PERL_IN_CORE</code>.</p>
<li><a name='foo_private.h'></a>foo_private.h</li>
<p>This contains declarations used internally by that subsystem, and which
must only be included within source files associated the subsystem.
This file defines the macro <code>PERL_IN_FOO</code> so that code knows when it is
being used within that subsystem. The file will also contain all the
'convenience' macros used to define shorter working names for functions
without the perl prefix (see below).</p>
<li><a name='foo_globals.h'></a>foo_globals.h</li>
<p>This file contains the declaration of a single structure containing the
private global variables used by the subsystem (see the section on
globals below for more details).</p>
<li><a name='foo.sym'></a>foo.sym</li>
<p>This file (format and contents TBD) contains information about global
symbols associated with the subsystem, and may be used by scripts to
auto-generate such stuff as the include files mentioned above, linker
map tables, documentation etc, based upon portability and extensibility
requirements.</p>
<li><a name='foo_bar.[ch] etc'></a>foo_bar.[ch] etc</li>
<p>All other source files associated with the subsystem will have the prefix
foo_</p>
</ul>
<li><a name='Header Files'></a>Header Files</li>
<p>All .h files should include the following &quot;guards&quot; to prevent
multiple-inclusion:</p>
<pre>    /* file header comments */

    #if !defined(PARROT_&lt;FILENAME&gt;_H_GUARD)
    #define PARROT_&lt;FILENAME&gt;_H_GUARD

    /* body of file */

    #endif /* PARROT_&lt;FILENAME&gt;_H_GUARD */</pre>
<li><a name='Names of code entities'></a>Names of code entities</li>
<p>Code entities such as variables, functions, macros etc (apart from strictly
local ones) should all follow these general guidelines.</p>
<ul>
<li><a name='Multiple words or components should be separated with underscores rather than using tricks such as capitalization, eg new_foo_bar rather than NewFooBar or (gasp) newfoobar.'></a>Multiple words or components should be separated with underscores rather
than using tricks such as capitalization, eg <code>new_foo_bar</code> rather than
<code>NewFooBar</code> or (gasp) <code>newfoobar</code>.</li>
<li><a name='The names of entities should err on the side of verbosity, eg create_foo_from_bar() in preference to ct_foo_bar(). Avoid cryptic abbreviations wherever possible.'></a>The names of entities should err on the side of verbosity, eg
<code>create_foo_from_bar()</code> in preference to <code>ct_foo_bar()</code>. Avoid cryptic
abbreviations wherever possible.</li>
<li><a name='All entities should be prefixed with the name of the subsystem they appear in, eg pmc_foo(), struct io_bar. They should be further prefixed with the word 'perl' if they have external visibility or linkage, namely, non-static functions, plus macros and typedefs etc which appear in public header files. (Global variables are handled specially; see below.) For example:'></a>All entities should be prefixed with the name of the subsystem they appear
in, eg <code>pmc_foo()</code>, <code>struct io_bar</code>. They should be further prefixed
with the word 'perl' if they have external visibility or linkage,
namely, non-static functions, plus macros and typedefs etc which appear
in public header files. (Global variables are handled specially; see below.)
For example:</li>
<pre>    perlpmc_foo()
    struct perlio_bar
    typedef struct perlio_bar Perlio_bar
    #define PERLPMC_readonly_TEST ...</pre>
<p>In the specific case of the use of global variables and functions
within a subsystem, convenience macros will be defined (in
foo_private.h) that allow use of the shortened name in the case of
functions (ie <code>pmc_foo()</code> instead of <code>perlpmc_foo()</code>), and hide the
real representation in the case of global variables.</p>
<li><a name='Variables and structure names should be all lower-case, eg pmc_foo.'></a>Variables and structure names should be all lower-case, eg <code>pmc_foo</code>.</li>
<li><a name='Structure elements should be all lower-case, and the first component of the name should incorporate the structure's name or an abbreviation of it.'></a>Structure elements should be all lower-case, and the first component of
the name should incorporate the structure's name or an abbreviation of it.</li>
<li><a name='Typedef names should be lower-case except for the first letter, eg Foo_bar. The exception to this is when the first component is a short abbreviation, in which case the whole first component may be made uppercase for readability purposes, eg IO_foo rather than Io_foo. Structures should generally be typedefed.'></a>Typedef names should be lower-case except for the first letter, eg
<code>Foo_bar</code>. The exception to this is when the first component is a
short abbreviation, in which case the whole first component may be made
uppercase for readability purposes, eg <code>IO_foo</code> rather than
<code>Io_foo</code>.  Structures should generally be typedefed.</li>
<li><a name='Macros should have their first component uppercase, and the majority of the remaining components should be likewise. Where there is a family of macros, the variable part can be indicated in lowercase, eg PMC_foo_FLAG, PMC_bar_FLAG, ....'></a>Macros should have their first component uppercase, and the majority
of the remaining components should be likewise. Where there is a family
of macros, the variable part can be indicated in lowercase, eg
<code>PMC_foo_FLAG</code>, <code>PMC_bar_FLAG</code>, ....</li>
<li><a name='A macro which defines a flag bit should be suffixed with _FLAG, eg PMC_readonly_FLAG (although you probably want to use an enum instead.)'></a>A macro which defines a flag bit should be suffixed with <code>_FLAG</code>, eg
<code>PMC_readonly_FLAG</code> (although you probably want to use an <code>enum</code>
instead.)</li>
<li><a name='A macro which tests a flag bit should be suffixed with _TEST, eg if (PMC_readonly_TEST(foo)) ...'></a>A macro which tests a flag bit should be suffixed with <code>_TEST</code>, eg
<code>if (PMC_readonly_TEST(foo)) ...</code></li>
<li><a name='A macro which sets a flag bit should be suffixed with _SET, eg PMC_readonly_SET(foo);'></a>A macro which sets a flag bit should be suffixed with <code>_SET</code>, eg
<code>PMC_readonly_SET(foo);</code></li>
<li><a name='A macro which clears a flag bit should be suffixed with _CLEAR, eg PMC_readonly_CLEAR(foo);'></a>A macro which clears a flag bit should be suffixed with <code>_CLEAR</code>, eg
<code>PMC_readonly_CLEAR(foo);</code></li>
<li><a name='A macro defining a mask of flag bits should be suffixed with _MASK, eg foo &amp;= ~PMC_STATUS_MASK (but see notes on extensibility below).'></a>A macro defining a mask of flag bits should be suffixed with <code>_MASK</code>,
eg <code>foo &amp;= ~PMC_STATUS_MASK</code> (but see notes on extensibility below).</li>
<li><a name='Macros can be defined to cover common flag combinations, in which case they should have _SETALL, CLEARALL, _TESTALL or &lt;_TESTANY&gt; suffixes as appropriate, to indicate aggregate bits, eg PMC_valid_CLEARALL(foo)'></a>Macros can be defined to cover common flag combinations, in which case they
should have <code>_SETALL</code>, <code>CLEARALL</code>, <code>_TESTALL</code> or &lt;_TESTANY&gt; suffixes
as appropriate, to indicate aggregate bits, eg
<code>PMC_valid_CLEARALL(foo)</code></li>
<li><a name='A macro defining an auto-configuration value should be prefixed with HAS_, eg HAS_BROKEN_FLOCK, HAS_EBCDIC.'></a>A macro defining an auto-configuration value should be prefixed with
<code>HAS_</code>, eg <code>HAS_BROKEN_FLOCK</code>, <code>HAS_EBCDIC</code>.</li>
<li><a name='A macro indicating the compilation 'location' should be prefixed with IN_, eg PERL_IN_CORE, PERL_IN_PMC, PERL_IN_X2P. Individual include file visitations should be marked with PERL_IN_FOO_H for file foo.h'></a>A macro indicating the compilation 'location' should be prefixed with
<code>IN_</code>, eg <code>PERL_IN_CORE</code>, <code>PERL_IN_PMC</code>, <code>PERL_IN_X2P</code>. Individual
include file visitations should be marked with <code>PERL_IN_FOO_H</code> for
file foo.h</li>
<li><a name='A macro indicating major compilation switches should be prefixed with USE_, eg PERL_USE_STDIO, USE_MULTIPLICITY.'></a>A macro indicating major compilation switches should be prefixed with
<code>USE_</code>, eg <code>PERL_USE_STDIO</code>, <code>USE_MULTIPLICITY</code>.</li>
<li><a name='A macro that may declare stuff and thus needs to be at the start of a block should be prefixed with DECL_, eg DECL_SAVE_STACK. Note that macros which implicitly declare and then use variables are strongly discouraged, unless it is essential for portability or extensibility. The following are in decreasing preference style-wise, but increasing preference extensibility-wise.'></a>A macro that may declare stuff and thus needs to be at the start of a
block should be prefixed with <code>DECL_</code>, eg <code>DECL_SAVE_STACK</code>. Note
that macros which implicitly declare and then use variables are strongly
discouraged, unless it is essential for portability or extensibility.
The following are in decreasing preference style-wise, but increasing
preference extensibility-wise.</li>
<pre>    { Stack sp = GETSTACK;  x = POPSTACK(sp) ... /* sp is an auto variable */
    { DECL_STACK(sp);  x = POPSTACK(sp); ... /* sp may or may not be auto */
    { DECL_STACK; x = POPSTACK; ... /* anybody's guess */</pre>
</ul>
<li><a name='Global Variables'></a>Global Variables</li>
<p>Global variables must never be accessed directly outside the subsystem
in which they are used. Some other method, such as accessor functions,
must be provided by that subsystem's API. (For efficiency the 'accessor
functions' may occasionally actually be macros, but then the rule still
applies in spirit at least).</p>
<p>All global variables needed for the internal use of a particular
subsystem should all be declared within a single struct called
foo_globals for subsystem foo. This structure's declaration is placed
in the file foo_globals.h. Then somewhere a single compound structure
will be declared which has as members the individual structures from
each subsystem. Instances of this structure are then defined as a
one-off global variable, or as per-thread instances, or whatever is
required.</p>
<p>[Actually, three separate structures may be required, for global,
per-interpreter and per-thread variables.]</p>
<p>Within an individual subsystem, macros are defined for each global
variable of the form <code>GLOBAL_foo</code> (the name being deliberately clunky).
So we might for example have the following macros:</p>
<pre>	/* perl_core.h or similar */

	#ifdef HAS_THREADS
	#  define GLOBALS_BASE (aTHX_-&gt;globals)
	#else
	#  define GLOBALS_BASE (Perl_globals)
	#endif

	/* pmc_private.h */

	#define GLOBAL_foo   GLOBALS_BASE.pmc.foo
	#define GLOBAL_bar   GLOBALS_BASE.pmc.bar
	... etc ...</pre>
</ul>
<a name='Code comments'></a><h2>Code comments</h2>
<p>The importance of good code documentation cannot be stressed enough.
To make your code understandable by others (and indeed by yourself when
you come to make changes a year later :-), the following conventions
apply to all source files.</p>
<ul>
<li><a name='Developer files'></a>Developer files</li>
<p>Each source file (eg a <b><i>foo.c</i></b> <b><i>foo.h</i></b> pair), should contain inline
POD documentation containing information on the
implementation decisions associated with the source file. (Note that
this is in contrast to PDDs, which describe design decisions). In addition,
more discussive documentation can be placed in <b><i>*.dev</i></b> files in the
<b><i>docs/dev</i></b> directory. This is
the place for mini-essays on how to avoid overflows in unsigned
arithmetic, or on the pros and cons of differing hash algorithms, and
why the current one was chosen, and how it works.</p>
<p>In principle, someone coming to a particular source file for the first
time should be able to read the inline documentation file and gain an immediate
overview of what the source file is for, the algorithms it implements,
etc.</p>
<p>The POD documentation should follow the standard POD layout:</p>
<ul>
<li><a name='Copyright'></a>Copyright</li>
<p>The Parrot copyright statement.</p>
<li><a name='CVS'></a>CVS</li>
<p>A CVS id string.</p>
<li><a name='NAME'></a>NAME</li>
<p>src/foo.c - Foo</p>
<li><a name='SYNOPSIS'></a>SYNOPSIS</li>
<p>When appropriate, some simple examples of usage.</p>
<li><a name='DESCRIPTION'></a>DESCRIPTION</li>
<p>A description of the contents of the file, how the implementation works,
data structures and algorithms, and anything that may be of interest to
your successors, eg benchmarks of differing hash algorithms, essays on
how to do integer arithmetic.</p>
<li><a name='HISTORY'></a>HISTORY</li>
<p>Record major changes to the file, eg &quot;we moved from a linked list to a
hash table implementation for storing Foos, as it was found to be much
faster&quot;.</p>
<li><a name='SEE ALSO'></a>SEE ALSO</li>
<p>Links to pages and books that may contain useful info relevant to the
stuff going on in the code - eg the book you stole the hash function
from.</p>
</ul>
<li><a name='Per-section comments'></a>Per-section comments</li>
<p>If there is a collection of functions, structures or whatever which are
grouped together and have a common theme or purpose, there should be a
general comment at the start of the section briefly explaining their
overall purpose. (Detailed essays should be left to the developer
file). If there is really only one section, then the top-of-file
comment already satisfies this requirement.</p>
<li><a name='Per-entity comments'></a>Per-entity comments</li>
<p>Every non-local named entity, be it a function, variable, structure,
macro or whatever, must have an accompanying comment explaining it's
purpose.  This comment must be in the special format described below,
in order to allow automatic extraction by tools - for example, to
generate per API man pages, <b>perldoc -f</b> style utilities and so on.</p>
<p>Often the comment need only be a single line explaining its purpose,
but sometimes more explanation may be needed. For example, &quot;return an
Integer Foo to its allocation pool&quot; may be enough to demystify the
function <code>del_I_foo()</code></p>
<p>Each comment should be of the form</p>
<pre>    /*

    =item C&lt;function(arguments)&gt;

    Description.

    =cut

    */</pre>
<p>This inline POD documentation is parsed to HTML by running:</p>
<pre>	% perl tools/docs/write_docs.pl -s</pre>
<li><a name='Optimizations'></a>Optimizations</li>
<p>Whenever code has deliberately been written in an odd way for
performance reasons, you should point this out - if nothing else, to
avoid some poor schmuck trying subsequently to replace it with something
'cleaner'.</p>
<pre>    /* The loop is partially unrolled here as it makes it a lot faster.
     * See the .dev file for the full details
     */</pre>
<li><a name='General comments'></a>General comments</li>
<p>While there is no need to go mad commenting every line of code, it is
immensely helpful to provide a &quot;running commentary&quot; every 10 or so
lines say; if nothing else, this makes it easy to quickly locate a
specific chunk of code. Such comments are particularly useful at the top
of each major branch, eg</p>
<pre>    if (FOO_bar_BAZ(**p+*q) &lt;= (r-s[FOZ &amp; FAZ_MASK]) || FLOP_2(z99)) {
	/* we're in foo mode: clean up lexicals */
	... (20 lines of gibberish) ...
    }
    else if (...) {
	/* we're in bar mode: clean up globals */
	... (20 more lines of gibberish) ...
    }
    else {
	/* we're in baz mode: self-destruct */
	....
    }</pre>
</ul>
<a name='Extensibility'></a><h2>Extensibility</h2>
<p>If Perl 5 is anything to go by, the lifetime of Perl 6 will be at least
seven years. During this period, the source code will undergo many
major changes never envisaged by its original authors - cf threads,
unicode in perl 5. To this end, your code should balance out the
assumptions that make things possible, fast or small, with the
assumptions that make it difficult to change things in future. This is
especially important for parts of the code which are exposed through
APIs - the requirements of src or binary compatibility for such things as
extensions can make it very hard to change things later on.</p>
<p>For example, if you define suitable macros to set/test flags in a
struct, then you can later add a second word of flags to the struct
without breaking source compatibility. (Although you might still break
binary compatibility if you're not careful.) Of the following two
methods of setting a common combination of flags, the second doesn't
assume that all the flags are contained within a single field:</p>
<pre>    foo-&gt;flags |= (FOO_int_FLAG | FOO_num_FLAG | FOO_str_FLAG);
    FOO_valid_value_SETALL(foo);</pre>
<p>Similarly, avoid using a char* (or {char*,length}) if it is feasible to
later use a PMC* at the same point: cf UTF-8 hash keys in Perl 5.</p>
<p>Of course, private code hidden behind an API can play more fast and
loose than code which gets exposed.</p>
<a name='Portability'></a><h2>Portability</h2>
<p>Related to extensibility is portability. Perl runs on many, many
platforms, and will no doubt be ported to ever more bizarre and obscure
ones over time.  You should never assume an operating system, processor
architecture, endian-ness, word size, or whatever. In particular, don't
fall into any of the following common traps:</p>
<p>Internal data types and their utility functions (especially for strings)
should be used over a bare char * whenever possible. Ideally there should
be no char * in the source anywhere, and no use of C's standard string
library.</p>
<p>Don't assume GNU C, and don't use any GNU extensions unless protected by
#ifdefs for non-GNU-C builds.</p>
<p>TBC ... Any contributions welcome !!!</p>
<a name='Performance'></a><h2>Performance</h2>
<p>We want Perl to be fast. Very fast. But we also want it to be portable
and extensible. Based on the 90/10 principle, (or 80/20, or 95/5,
depending on who you speak to), most performance is gained or lost in a
few small but critical areas of code. Concentrate your optimization
efforts there.</p>
<p>Note that the most overwhelmingly important factor in performance is in
choosing the correct algorithms and data structures in the first place.
Any subsequent tweaking of code is secondary to this. Also, any
tweaking that is done should as far as possible be platform
independent, or at least likely to cause speed-ups in a wide variety of
environments, and do no harm elsewhere. Only in exceptional
circumstances should assembly ever even be considered, and then only if
generic fallback code is made available that can still be used by all
other non-optimized platforms.</p>
<p>Probably the dominant factor (circa 2001) that effects processor
performance is the cache. Processor clock rates have increased far in
excess of main memory access rates, and the only way for the
processor to proceed without stalling is for most of the data items it
needs to be found to hand in the cache. It is reckoned that even a 2%
cache miss rate can cause a slowdown in the region of 50%. It is for
this reason that algorithms and data structures must be designed to be
'cache-friendly'.</p>
<p>A typical cache may have a block size of anywhere between 4 and 256
bytes.  When a program attempts to read a word from memory and the word
is already in the cache, then processing continues unaffected.
Otherwise, the processor is typically stalled while a whole contiguous
chunk of main memory is read in and stored in a cache block. Thus,
after incurring the initial time penalty, you then get all the memory
adjacent to the initially read data item for free.  Algorithms that make
use of this fact can experience quite dramatic speedups.  For example,
the following pathological code ran four times faster on my machine by
simply swapping <code>i</code> and <code>j</code>.</p>
<pre>    int a[1000][1000];

    ... (a gets populated) ...

    int i,j,k;
    for (i=0; i&lt;1000; i++) {
        for (j=0; j&lt;1000; j++) {
            k += a[j][i];
        }
    }</pre>
<p>This all boils down to: keep things near to each other that get
accessed at around the same time. (This is why the important
optimizations occur in data structure and algorithm design rather than
in the detail of the code.) This rule applies both to the layout of
different objects relative to each other, and to the relative
positioning of individual fields within a single structure.</p>
<p>If you do put an optimization in, time it on as many architectures as
you can, and be suspicious of it if it slows down on any of them! Perhaps
it will be slow on other architectures too (current and future).
Perhaps it wasn't so clever after all? If the optimization is platform
specific, you should probably put it in a platform-specific function in
a platform-specific file, rather than cluttering the main source with
zillions of #ifdefs.</p>
<p>And remember to document it.</p>
<p>Loosely speaking, Perl tends to optimism for speed rather than space,
so you may want to code for speed first, then tweak to reclaim some
space while not affecting performance.</p>
<a name='REFERENCES'></a><h1>REFERENCES</h1>
<p>The section on coding style is based on Perl5's <b><i>Porting/patching.pod</i></b>
by Daniel Grisinger. The section on naming conventions grew from some
suggestions by Paolo Molaro &lt;<a href='mailto:lupus@lettere.unipd.it'>lupus@lettere.unipd.it</a>&gt;. Other snippets
came from various P5Pers. The rest of it is probably my fault.</p>
<a name='VERSION'></a><h1>VERSION</h1>
<a name='CURRENT'></a><h2>CURRENT</h2>
<pre>   Maintainer: Dave Mitchell &lt;<a href='mailto:davem@fdgroup.com'>davem@fdgroup.com</a>&gt;
   Class: Internals
   PDD Number: 7
   Version: 1
   Status: Developing
   Last Modified: 6 August 2001
   PDD Format: 1
   Language: English</pre>
<a name='HISTORY'></a><h2>HISTORY</h2>
<p>Based on an earlier draft which covered only code comments.</p>
<a name='CHANGES'></a><h1>CHANGES</h1>
<p>None. First version</p>
</div>
